Search results for "Patient identification"
showing 4 items of 4 documents
Linking anonymized databases for national and international multicenter epidemiological studies: proposal of an algorithm…
2009
Background: Compiling individual records coming from different sources is very important for multicenter epidemiological studies; however, European directives and other national legislation concerning nominal data processing must be respected. These legal aspects can be satisfied by implementing mechanisms that allow anonymization of patient data (such as hashing techniques). Moreover, for security reasons, official recommendations suggest using different cryptographic keys in combination with a cryptographic hash function for each study. Unfortunately, this type of anonymization procedure is in contradiction with common requirements in public health and biomedical research because it becom…
An optimal code for patient identifiers.
2004
How can 1 billion individuals be distinguished by an eight-character identifier while still allowing a reasonable amount of error detection or even error correction? Our solution to this problem is an optimal code over a 32-character alphabet that detects up to two errors and corrects one error as well as a transposition of two adjacent characters. The corresponding encoding and error-checking algorithms are freely available; they are also embedded as components of the pseudonymisation service used in the TMF, the German telematics platform for health research networks.
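The counting argument behind this abstract can be checked with a few lines of arithmetic. The sketch below is not the authors' code and implements no error correction; it only assumes the figures quoted above (a 32-character alphabet, eight characters, 1 billion individuals) and works out how many characters are needed to carry the identity and how many remain for redundancy. The split it reports is an inference from those figures, not a parameter stated in the abstract, and all names are illustrative.

```python
# Figures taken from the abstract; the names are hypothetical.
ALPHABET_SIZE = 32        # characters in the identifier alphabet
IDENTIFIER_LENGTH = 8     # total characters per identifier
POPULATION = 10**9        # individuals to distinguish


def min_data_chars(population: int, alphabet_size: int) -> int:
    """Smallest k such that alphabet_size**k >= population."""
    k = 0
    while alphabet_size**k < population:
        k += 1
    return k


# Characters needed to give every individual a distinct value.
data_chars = min_data_chars(POPULATION, ALPHABET_SIZE)

# Characters left over for error detection and correction.
check_chars = IDENTIFIER_LENGTH - data_chars

print(f"data characters:  {data_chars} (capacity {ALPHABET_SIZE**data_chars:,})")
print(f"check characters: {check_chars}")
```

Because 32^6 = 2^30 = 1,073,741,824 just exceeds 1 billion while 32^5 does not, six characters suffice for the identity and two are left for checking.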
Bagging, bumping, multiview, and active learning for record linkage with empirical results on patient identity data
2011
Record linkage or deduplication deals with the detection and deletion of duplicates in and across files. For this task, this paper introduces and evaluates two new machine-learning methods (bumping and multiview) together with bagging, a tree-based ensemble approach. Whereas bumping is also a tree-based approach, multiview is based on the combination of different methods and the semi-supervised learning principle. After providing a theoretical background of the methods, initial empirical results on patient identity data are given. In the empirical evaluation, we calibrate the methods on three different kinds of training data. The results show that the smallest training data set, …
Combining hashing and enciphering algorithms for epidemiological analysis of gathered data.
2008
Objectives: Compiling individual records coming from different sources is necessary for multi-center studies. Legal aspects can be satisfied by implementing anonymization procedures. When these procedures are used with a different key for each study, it becomes almost impossible to link records from separate data collections. Methods: The originality of the method lies in the way hashing and enciphering techniques are combined: as in asymmetric encryption, two keys are used, but the private key depends on the patient’s identity. Results: The combination of hashing and enciphering techniques provides a great improvement in the overall security of the proposed scheme…
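The linkage problem stated in the Objectives can be illustrated with a keyed hash. The sketch below is not the method proposed in the paper (the combination of hashing and enciphering is not reproduced here); it uses HMAC-SHA256 from the Python standard library as a stand-in for "a cryptographic hash function with a key per study". The normalization rule, the field layout, and the key values are all hypothetical.

```python
import hashlib
import hmac
import unicodedata


def normalize(last_name: str, first_name: str, birth_date: str) -> str:
    """Fold accents and case so spelling variants hash identically.

    Hypothetical normalization rule, chosen only for this illustration.
    """
    raw = f"{last_name}|{first_name}|{birth_date}"
    folded = unicodedata.normalize("NFKD", raw)
    ascii_only = folded.encode("ascii", "ignore").decode("ascii")
    return ascii_only.upper().strip()


def pseudonym(identity: str, study_key: bytes) -> str:
    """Keyed hash of the normalized identity (HMAC-SHA256, hex encoded)."""
    return hmac.new(study_key, identity.encode("utf-8"), hashlib.sha256).hexdigest()


# The same patient as recorded by two different sources.
record_1 = normalize("Müller", "Anna", "1970-01-31")
record_2 = normalize("MULLER", "anna", "1970-01-31")

# Hypothetical per-study keys.
key_study_a = b"hypothetical-key-study-A"
key_study_b = b"hypothetical-key-study-B"

# Within one study the two records receive the same pseudonym and can be linked.
linked_within_study = pseudonym(record_1, key_study_a) == pseudonym(record_2, key_study_a)

# Across studies the pseudonyms differ, so the records cannot be matched directly.
linked_across_studies = pseudonym(record_1, key_study_a) == pseudonym(record_1, key_study_b)

print("linkable within a study:", linked_within_study)
print("linkable across studies:", linked_across_studies)
```

With one key per study, equal pseudonyms identify the same patient only inside that study, which is the obstacle to cross-study linkage that the abstract sets out to address.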